1. Introduction

We discuss a selection of topics in Algebraic Statistics, mainly about Ising
models and Markov models. Our presentation of the basics is slightly
different from other excellent presentations of the topic. It is based on
work in progress and on previous conference presentations, in particular our
presentations at the Second CREST-SBM International Conference "Harmony of
Gröbner Bases and the Modern Industrial Society", June 28 - July 2, 2010,
Osaka, Japan. Due to the review character of this paper, we do not have
in-line references, but we give commented references in the Bibliographical
Notes at the end of each section.
2. Lattice exponential families

Our introductory example is the Ising model from Statistical Physics. Given
an undirected graph without loops $(V, \mathcal{E})$, we consider a collection
$X_v$, $v \in V$, of ±1-valued random variables on the finite sample space
$(\mathcal{X}, \mu)$. For each edge $vw = e \in \mathcal{E}$, the ±1-valued
random variable $X_e = X_v X_w$ is called an interaction. The exponential
family of densities

\[ p_\theta = \exp\Big( \sum_{v \in V} \theta_v X_v + \sum_{e \in \mathcal{E}} \theta_e X_e - \psi(\theta) \Big), \qquad \theta = (\theta_v, \theta_e) \in \mathbb{R}^{V} \times \mathbb{R}^{\mathcal{E}}, \tag{1} \]

is the Ising model. The densities are taken with respect to the reference
measure $\mu$, hence

\[ \psi(\theta) = \log \Big( \int \exp\Big( \sum_{v \in V} \theta_v X_v + \sum_{e \in \mathcal{E}} \theta_e X_e \Big) \, d\mu \Big) . \]
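As a numerical illustration of Eq. (1), the following minimal sketch evaluates the Ising density by brute-force enumeration on a small graph. The function name `ising_density`, the path graph on three vertices, the parameter values, and the choice of the counting measure as reference measure are all assumptions made for this example only.

```python
import itertools
import math

def ising_density(vertices, edges, theta_v, theta_e):
    """Return ({x: p_theta(x)}, psi) for the Ising model of Eq. (1).

    x is a tuple of +1/-1 values, one per vertex. The reference measure is
    assumed (for this sketch) to be the counting measure on the sample space.
    """
    index = {v: i for i, v in enumerate(vertices)}
    weights = {}
    for x in itertools.product((+1, -1), repeat=len(vertices)):
        # Linear part: vertex terms plus edge interactions X_v * X_w.
        s = sum(theta_v[v] * x[index[v]] for v in vertices)
        s += sum(theta_e[(v, w)] * x[index[v]] * x[index[w]] for (v, w) in edges)
        weights[x] = math.exp(s)
    total = sum(weights.values())
    psi = math.log(total)  # cumulant function psi(theta)
    return {x: wt / total for x, wt in weights.items()}, psi

# Hypothetical example: a path graph 1 - 2 - 3.
vertices = [1, 2, 3]
edges = [(1, 2), (2, 3)]
theta_v = {1: 0.2, 2: -0.1, 3: 0.0}
theta_e = {(1, 2): 0.5, (2, 3): -0.3}
p, psi = ising_density(vertices, edges, theta_v, theta_e)
```

With all parameters equal to zero the density is uniform and the cumulant function equals the logarithm of the number of sample points.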

It is possible to describe the Ising model as a differentiable manifold, that
is without reference to any specific chart, by saying that Eq. (1) is a special
parameterization of the set of all strictly positive probability densities $p$
such that

\[ \log p \in \mathcal{V} = \operatorname{Span}\big( 1;\ X_v : v \in V;\ X_e : e \in \mathcal{E} \big) . \tag{2} \]

However, the Ising model has an extra special feature: the so-called
canonical statistics, i.e. the linear basis of the vector space in Eq. (2)
which is used to obtain the parameterization in Eq. (1), are integer valued
random variables. We call a model of this type a lattice exponential family
(LEF). It is always possible to parameterize a LEF with canonical statistics
that are nonnegative and not strictly positive. For example, in the Ising
model we can use the binary variables $A_v = (1 - X_v)/2$, $v \in V$, and
$A_e = A_v \,\mathrm{XOR}\, A_w = A_v + A_w - 2 A_v A_w$, $e = vw \in \mathcal{E}$,
to get the same model in a different parameterization:

\[ p_\beta = \exp\Big( \sum_{v \in V} \beta_v A_v + \sum_{e \in \mathcal{E}} \beta_e A_e - \tilde\psi(\beta) \Big), \tag{3} \]

with the obvious change of parameters $\theta \mapsto \beta$,
$\psi \mapsto \tilde\psi$. It should be noted that $X_v = (-1)^{A_v}$ and
$X_e = X_v X_w = (-1)^{A_v + A_w} = (-1)^{A_e}$. In fact, the re-coding is
actually the character of the group
$\mathbb{Z}_2 \ni a \mapsto (-1)^a \in \{+1, -1\} \subset \mathbb{C}$.

In Statistical Physics, a model as in Eq. (3) is called a Gibbs (or
Boltzmann-Gibbs) model. The interest of nonnegative but not strictly positive
canonical statistics appears in the discussion of the limit case where some
of the $\beta$'s tend to $-\infty$. In such a case a limit distribution with
smaller support is obtained.
Another parameterization of interest is obtained by taking in Eq. (3)
the nonlinear transformation $t_v = e^{\beta_v}$, $v \in V$,
$t_e = e^{\beta_e}$, $e \in \mathcal{E}$, to get the monomial form

\[ p_t \propto \prod_{v \in V} t_v^{A_v} \prod_{e \in \mathcal{E}} t_e^{A_e} . \tag{4} \]

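Let us make a second remark. The random variable $\log p$ belongs to
the vector space $\mathcal{V}$ generated by the constants and the canonical
statistics if, and only if, it is orthogonal in $\mathbb{R}^{\mathcal{X}}$ to
each random variable $K$ in the orthogonal space $\mathcal{V}^{\perp}$. In
other words, a density $p$ belongs to the model of Eq. (3) for some $\beta$,
or to the model in Eq. (4) for some $t$, if, and only if, the equation

\[ 0 = \sum_{x \in \mathcal{X}} \log p(x)\, K(x) = \log \Big( \prod_{x \in \mathcal{X}} p(x)^{K(x)} \Big), \tag{5} \]

holds for all $K$ such that

\[ \sum_{x \in \mathcal{X}} A_v(x) K(x) = 0, \ v \in V, \qquad \sum_{x \in \mathcal{X}} A_e(x) K(x) = 0, \ e \in \mathcal{E} . \]

By considering the positive and negative part, $K = K^+ - K^-$, Eq. (5)
can be written as

\[ \prod_{x \in \mathcal{X}} p(x)^{K^+(x)} = \prod_{x \in \mathcal{X}} p(x)^{K^-(x)} . \tag{6} \]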

This argument is true for all exponential families. In particular, in the
lattice case, it is possible to find a vector basis of the orthogonal space
whose elements $K$ are all integer valued. As a consequence, Eq. (4) and
Eq. (6) are both polynomials with indeterminates $p(x)$, $x \in \mathcal{X}$,
$t_v$, $v \in V$, and $t_e$, $e \in \mathcal{E}$. The binomials in Eq. (6) are
the polynomial invariants of the LEF model.

In the Ising model it is easy to find a linear basis of the orthogonal
space, namely the set of all interactions $X_I = \prod_{v \in I} X_v$,
$I \subset V$, which are not included in the model itself.

We turn now to the study of statistical models of the special monomial
type of Eq. (4). Several arguments could support this approach, but in our
view the basic one is the following: in Eq. (2) the probability $p$ is assumed
to be strictly positive, while both Eq. (4) and Eq. (6) make sense when
$p(x) = 0$ at some $x \in \mathcal{X}$.
Notes

The Ising model is named after the physicist Ernst Ising (1900-1998) and
is the basic mathematical model for ferromagnetism. We do not discuss
at all its applications to Statistical Physics, where in fact special cases
are considered, see e.g. [1]. The unifying concept of exponential family was
fully developed in the classical monograph by Barndorff-Nielsen [2]; a recent
exposition of its multiple applications is the review by Wainwright and
Jordan [3]. The importance of a parameter free and geometrical approach was
discovered by Čencov [4] and evolved into what is now called nonparametric
Information Geometry, see the seminal papers by Phil Dawid [5, 6] and the
functional version by Pistone and Sempi [7]. The algebraic approach emerged
in the 90's. It was first outlined in a monograph by Pistone, Riccomagno
and Wynn [8] and fully developed in a paper by Geiger, Meek and Sturmfels [9].
Currently there is an extensive literature, tagged Algebraic Statistics, which
we will refer to in the following sections.

3. A-model 

We work on a finite sample space $\mathcal{X}$ with reference measure $\mu$.
We consider a nonnegative integer model matrix
$A \in \mathbb{Z}_{\ge 0}^{(m+1) \times \mathcal{X}}$ representing $m + 1$
random variables $A_i$, $i = 0, 1, \dots, m$. The elements of the matrix $A$
are denoted by $A_i(x)$, $i = 0, \dots, m$, $x \in \mathcal{X}$. We assume the
row $A_0$ to be the constant 1. The $x$-column of $A$, say $A(x)$, is a
multi-exponent of the monomial term

\[ t^{A(x)} = t_0^{A_0(x)} t_1^{A_1(x)} \cdots t_m^{A_m(x)} . \tag{7} \]

Definition 3.1 (A-model). The monomial model of the model matrix $A$
(briefly, the A-model) is defined as follows.

(1) The unnormalized probability densities of the A-model are of the form

\[ q(x; t) = t^{A(x)}, \quad x \in \mathcal{X}, \]

for all $t \in \mathbb{R}_{\ge 0}^{m+1}$ such that $q(\cdot\,; t)$ is not identically zero.

(2) The probability densities with respect to $\mu$ in the A-model are

\[ p(x; t) = q(x; t)/Z(t), \qquad Z(t) = \sum_{x \in \mathcal{X}} q(x; t)\, \mu(x) . \]

(3) If $t > 0$, then $\beta = \log t$ and $q(x; \beta) = \exp(\beta \cdot A(x))$, i.e. the interior of the
A-model is a LEF in the parameters $\beta$.
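Definition 3.1 translates directly into a few lines of code. The sketch below is illustrative only: the function name `a_model_density` and the list-of-rows encoding of the model matrix are assumptions of this example, and the binomial matrix with $n = 4$ is used as test data.

```python
from math import comb, prod

def a_model_density(A, t, mu):
    """Density p(x; t) of Definition 3.1 with respect to mu.

    A  : list of rows, each row a list indexed by the sample points
    t  : list of nonnegative parameters, one per row of A
    mu : reference measure, one weight per sample point
    """
    n_points = len(A[0])
    # Unnormalized density q(x; t) = prod_i t_i ** A_i(x); note 0 ** 0 == 1.
    q = [prod(t[i] ** A[i][x] for i in range(len(A))) for x in range(n_points)]
    Z = sum(q[x] * mu[x] for x in range(n_points))
    if Z == 0:
        raise ValueError("q(.; t) is identically zero")
    return [qx / Z for qx in q]

# Binomial example with n = 4: rows A_0 = 1 and A_1(x) = x.
n = 4
A = [[1] * (n + 1), list(range(n + 1))]
mu = [comb(n, x) for x in range(n + 1)]
p = a_model_density(A, [1.0, 0.5], mu)
p_rescaled = a_model_density(A, [7.0, 0.5], mu)  # a different value of t_0
```

Changing `t[0]` rescales `q` and `Z` by the same factor, so the two densities computed above coincide.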
The probability density does not depend on $t_0$, so that we usually drop
the $t_0$ parameter:

\[ p(x; t_1, \dots, t_m) = \frac{t_1^{A_1(x)} \cdots t_m^{A_m(x)}}{\sum_{y \in \mathcal{X}} t_1^{A_1(y)} \cdots t_m^{A_m(y)}\, \mu(y)} . \tag{8} \]

However, it is useful to keep it in the notation of the unnormalized density
which is a projective object.

The product $t_1^{A_1(x)} \cdots t_m^{A_m(x)}$ is strictly positive for
$t_i > 0$, $i = 1, \dots, m$, and it is identically zero for $t_i = 0$ if
$A_i(x) > 0$ for all $x \in \mathcal{X}$. If a row $A_i$ is not strictly
positive, then the unnormalized density is defined for all $t$ in the face
$\{t_i = 0\}$ of the positive quadrant $\mathbb{R}_{\ge 0}^{m+1}$. This face
parameterizes an $A^i$-model with all parameters but $t_i$ and sample space
$\mathcal{X}_i = \{x \in \mathcal{X} : A_i(x) = 0\}$, $A^i$ being the
submatrix of $A$ obtained deleting the $i$-th row and all the columns $x$ such
that $A_i(x) > 0$. A similar argument applies to the case where
$A_i(x) + A_j(x) = 0$ for at least one $x \in \mathcal{X}$.

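Let us discuss the identifiability of the interior of an A-model by deriving
a confounding equation.

Proposition 3.1. Two parameter values $s, t \in \mathbb{R}_{>0}^{m+1}$ are
such that $p_s = p_t$ if, and only if,

\[ (\log(t_i/s_i) : i = 0, 1, \dots, m) \in \operatorname{Span}(e_0) + \ker A^{T}, \qquad e_0 = (1, 0, \dots, 0) . \tag{9} \]

Proof. Denote by $Z$ the normalizing constant. Then $p_t = p_s$ if, and only
if,

\[ Z(s)\, t^{A(x)} = Z(t)\, s^{A(x)}, \quad x \in \mathcal{X}, \]

hence

\[ \sum_{i=0}^{m} (\log t_i - \log s_i) A_i(x) = \log Z(t) - \log Z(s), \quad x \in \mathcal{X} . \]

If $\log Z(t) \ne \log Z(s)$ and we define
$\delta_i = (\log t_i - \log s_i)/(\log Z(t) - \log Z(s))$, then
$\delta^{T} A = 1$. As the first row of $A$ is 1, the first vector of the
canonical basis satisfies $e_0^{T} A = 1$, so that
$\delta - e_0 \in \ker A^{T}$. If $\log Z(t) = \log Z(s)$, the vector of the
log-ratios itself belongs to $\ker A^{T}$. In both cases the confounding
equation is Eq. (9). □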

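Let two matrices $A \in \mathbb{Z}_{\ge 0}^{(m+1) \times \mathcal{X}}$ and
$B \in \mathbb{Z}_{\ge 0}^{(n+1) \times \mathcal{X}}$ be given. When does the
interior of the A-model represent the same statistical model as the
interior of the B-model?

Proposition 3.2. The interiors of the A-model and the B-model coincide
if and only if RowSpan A = RowSpan B.

Proof. Assume that for $t > 0$ and $s > 0$ there is a positive constant $c$
such that

\[ t^{A(x)} = c\, s^{B(x)}, \quad x \in \mathcal{X} . \]

It follows that

\[ \sum_{i=0}^{m} \log t_i \, A_i(x) = \log c + \sum_{j=0}^{n} \log s_j \, B_j(x), \quad x \in \mathcal{X} . \tag{10} \]

Since $\log t$ and $\log s$ range over all real vectors and both matrices
contain the constant row 1, Eq. (10) can be satisfied for every $t$ by a
suitable $s$, and conversely, if, and only if, the two row spans are equal.
□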
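The criterion of Proposition 3.2 can be checked mechanically by comparing ranks. The following sketch uses exact rational Gaussian elimination; the helper names `rank` and `same_row_span`, and the three test matrices, are assumptions of this example rather than part of the original text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of an integer matrix by exact Gaussian elimination over Q."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def same_row_span(A, B):
    """RowSpan A == RowSpan B iff stacking the matrices does not raise the rank."""
    return rank(A) == rank(B) == rank(A + B)  # A + B concatenates the row lists

ones = [1] * 6
A = [ones, [0, 1, 2, 3, 4, 5]]
B = [ones, [0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]]   # third row = 5 * ones - second row
C = [ones, [0, 1, 4, 9, 16, 25]]                      # a different model
```

Here `same_row_span(A, B)` holds because the extra row of `B` is a linear combination of the rows of `A`, while `C` spans a different space.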



It is relevant to note that the equivalence of the interiors does not imply 
the equivalence of the borders, as the following example shows. This topic 
is discussed in the next Section. 

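The simplest example of A-model is the Binomial(n,p) with state space
$\mathcal{X} = \{0, 1, 2, 3, \dots, n\}$, measure $\mu(x) = \binom{n}{x}$,
model matrix

\[ A = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & \cdots & n \\ \hline
0 & 1 & 1 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & 2 & 3 & \cdots & n
\end{array} \]

unnormalized density $q(x; t_0, t_1) = t_0 t_1^{x}$, and density
$p(x; t_1) = t_1^{x} / (1 + t_1)^{n}$, $x = 0, 1, \dots, n$ and $t_1 \ge 0$.

A second monomial model with the same interior has model matrix

\[ B = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & \cdots & n-1 & n \\ \hline
0 & 1 & 1 & 1 & \cdots & 1 & 1 \\
1 & 0 & 1 & 2 & \cdots & n-1 & n \\
2 & n & n-1 & n-2 & \cdots & 1 & 0
\end{array} \]

unnormalized density $q(x; t_0, t_1, t_2) = t_0 t_1^{x} t_2^{n-x}$, and
density $p(x; t_1, t_2) = t_1^{x} t_2^{n-x} / (t_1 + t_2)^{n}$.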

The Gibbs model in Eq. (3) with state space $\mathcal{X} = \{+1, -1\}^{V}$
has a model matrix whose rows are indexed by $0, V, \mathcal{E}$ and entries
$A_0(x) = 1$, $A_v(x) = (1 - X_v)/2$, $A_{vw}(x) = (1 - X_v X_w)/2$.

In some applications the statistical model is further constrained. We
consider here two types of constraints: linear constraints on the probability
densities and linear constraints on the parameters of the monomial model.

In the first case a matrix $C \in \mathbb{Z}^{k \times \mathcal{X}}$ is given
and the statistical model is $q(x; t) = t^{A(x)}$, restricted to all $t$'s
such that $\sum_{x \in \mathcal{X}} C_i(x)\, q(x; t) = 0$, $i = 1, \dots, k$.
In the second case the parameters $t$ are constrained by a linear variety. In
general, the constrained statistical model is not an A-model anymore.
Instead, it is an instance of a curved exponential family.



Notes

The term A-model was first used in the seminal paper by Geiger, Meek and
Sturmfels [9]. It is currently of general use, but unfortunately the
definition has been adapted by various authors to their special needs. For
example, the original paper assumes the column sums to be constant, which we
do not. A further (small) issue comes from the presentation of the matrix A:
in the statistical literature the model matrix has sample points as rows,
while the algebraic literature takes sample points as columns. The geometry
of curved exponential families was first discussed by Efron [10].

4. Toric ideals and the closure of the A-model 

The kernel of the ring homomorphism from $\mathbb{Q}[q(x) : x \in \mathcal{X}]$
to $\mathbb{Q}[t_0, \dots, t_m]$ defined by $q(x) \mapsto t^{A(x)}$,
$x \in \mathcal{X}$, is the toric ideal of $A$, $I(A)$. It is a prime
ideal generated by binomials

\[ \prod_{x:\, k(x) > 0} q(x)^{k^+(x)} - \prod_{x:\, k(x) < 0} q(x)^{k^-(x)} \tag{11} \]

with $k \in \mathbb{Z}^{\mathcal{X}} \cap \ker A$, hence there exists a finite
generating set of binomials.

The polynomials in Eq. (11) are the polynomial invariants of the A-model
and all its unnormalized densities belong to the intersection of the variety
of the toric ideal with $\mathbb{R}_{\ge 0}^{\mathcal{X}}$. Because of the
assumption $A_0 = 1$, we have $\sum_{x \in \mathcal{X}} k(x) = 0$, so that the
binomials in Eq. (11) are homogeneous polynomials. Hence all densities
$p_t = q_t / Z(t)$ in the A-model satisfy the same binomial equations.
In fact, more is true.

Proposition 4.1. The intersection of the A-variety with the probability 
simplex is the closure of the A-model. 

We discuss below a slightly different version of this basic result. 

Let $B$ be a model matrix such that the A-model and the B-model are
equal in the interior of the parameter space. Each row of $B$ belongs to the
set $\mathbb{Z}_{\ge 0}^{\mathcal{X}} \cap \operatorname{RowSpan} A$. This set
is closed under vector sum and has a unique minimal generating set, which is
called Hilbert basis. Each vector in the Hilbert basis is nonnegative and,
because of the minimality, has at least one zero. Let $H$ be a matrix with
margins $\{1, \dots, h\} \times \mathcal{X}$, whose rows are the vectors of
the Hilbert basis.

Proposition 4.2.

(1) The H-model is the closure of the A-model, i.e. each density in the
H-model is a limit of a sequence in the A-model.

(2) Setting $t_j = 0$ in the H-model, we obtain a limit $H^j$-model whose
support is $\mathcal{X}_j = \{x \in \mathcal{X} : H_j(x) = 0\}$.

(3) The $H^j$-model on $\mathcal{X}_j$ is the H-model conditioned to $\mathcal{X}_j$.

It should be noted that the H-model in the previous proposition is
possibly non minimal among models with the closure property in Item (1).
A basis producing a minimal representation of all limits is called a circuit
basis. If the Hilbert basis is boolean, then it is also a minimal description
of the border.

When the model is constrained, the admissible limits are obtained by
intersecting the constraints with the faces of the nonnegative quadrant. This
is discussed in the following examples.



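4.1. Example: the binomial

The integer kernel of

\[ A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 & 5 \end{bmatrix} \]

is $\mathbb{Q}$-generated by the rows of

\[ \begin{bmatrix}
1 & -2 & 1 & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & 0 & 0 \\
0 & 0 & 1 & -2 & 1 & 0 \\
0 & 0 & 0 & 1 & -2 & 1
\end{bmatrix} \]

and the corresponding binomials are

\[ q(0)q(2) - q(1)^2, \quad q(1)q(3) - q(2)^2, \quad q(2)q(4) - q(3)^2, \quad q(3)q(5) - q(4)^2 . \]

The Hilbert basis of RowSpan A is

\[ H = \begin{bmatrix} 0 & 1 & 2 & 3 & 4 & 5 \\ 5 & 4 & 3 & 2 & 1 & 0 \end{bmatrix} \]

and hence

\[ q(x; t_0, t_1, t_2) = t_0\, t_1^{x}\, t_2^{5-x} . \]

The admissible defective supports for limits are {0} and {5}. Assume we
add the constraint $p(0) = p(5)$, i.e. the constraint matrix
$(1, 0, 0, 0, 0, -1)$. In monomial form the constraint is
$t_1^{0} t_2^{5-0} = t_1^{5} t_2^{5-5}$, i.e. $t_1 = t_2$. This constraint
happens to be a binomial, and the constrained model reduces to a single
distribution, namely the one whose density with respect to $\mu$ is uniform,
i.e. the Binomial(5, 1/2) distribution.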
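The statements of this example are easy to verify numerically. The sketch below (the function names `q` and `toric_binomials` and the integer parameter values are assumptions of this illustration) evaluates the four binomials on the monomial parameterization and then imposes the constraint $t_1 = t_2$.

```python
from math import comb

def q(x, t0, t1, t2, n=5):
    """Unnormalized density t0 * t1**x * t2**(n - x) of the binomial H-model."""
    return t0 * t1 ** x * t2 ** (n - x)

def toric_binomials(qs):
    """Values of q(i) q(i+2) - q(i+1)^2 for i = 0, ..., len(qs) - 3."""
    return [qs[i] * qs[i + 2] - qs[i + 1] ** 2 for i in range(len(qs) - 2)]

# Integer parameters keep the arithmetic exact.
qs = [q(x, 3, 2, 7) for x in range(6)]
values = toric_binomials(qs)            # all four binomials vanish

# Constraint t1 = t2: the density with respect to mu(x) = C(5, x) is constant.
qc = [q(x, 1, 2, 2) for x in range(6)]
Z = sum(qc[x] * comb(5, x) for x in range(6))
density = [qx / Z for qx in qc]
probabilities = [density[x] * comb(5, x) for x in range(6)]  # Binomial(5, 1/2)
```

The list `probabilities` weights the constant density by the reference measure, which is how the single constrained distribution appears on the sample space.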
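4.2. Example: 3 binary identical RVs, no 3-way interaction

Consider the sample space $\mathcal{X} = \{+, -\}^3$ and model matrix

\[ A = \begin{array}{c|cccccccc}
   & +++ & -++ & +-+ & --+ & ++- & -+- & +-- & --- \\ \hline
0  & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1  & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
2  & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
3  & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
12 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
13 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
23 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0
\end{array} \tag{12} \]

It is a special Ising model on a complete graph on 3 vertices. The orthogonal
space is generated by the vector of the 3-way interaction

\[ \begin{array}{c|cccccccc}
   & +++ & -++ & +-+ & --+ & ++- & -+- & +-- & --- \\ \hline
X_1 X_2 X_3 & 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{array} \]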

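The Hilbert basis is given by the rows of the matrix

\[ H = \begin{array}{c|cccccccc}
   & +++ & -++ & +-+ & --+ & ++- & -+- & +-- & --- \\ \hline
1  & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
2  & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
3  & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
4  & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
5  & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
6  & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
7  & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
8  & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
9  & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
10 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
11 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
12 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
13 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
14 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
15 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
16 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0
\end{array} \]

The previous matrix was computed with symbolic software. However,
we note that each row is obtained by taking a single 1 in the subset where
the value of the 3-way interaction equals 1 and another one in the
complementary subset, for a total of 4 x 4 = 16 rows. A proof of the Hilbert
basis property could be based on the minimality of the support of such vectors.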
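The combinatorial description of the rows can be reproduced in a few lines. In the sketch below the ordering of the generated rows is arbitrary and need not match the numbering $1, \dots, 16$ used for the matrix above; the names `three_way`, `plus`, `minus` and `rows` are assumptions of this illustration.

```python
from itertools import product

columns = ["+++", "-++", "+-+", "--+", "++-", "-+-", "+--", "---"]

def three_way(label):
    """Value of X1 * X2 * X3 at a sample point written as a string of signs."""
    sign = 1
    for ch in label:
        sign *= 1 if ch == "+" else -1
    return sign

k = [three_way(c) for c in columns]                  # generator of the orthogonal space
plus = [i for i, v in enumerate(k) if v == 1]        # columns with interaction +1
minus = [i for i, v in enumerate(k) if v == -1]      # columns with interaction -1

rows = []
for i, j in product(plus, minus):
    row = [0] * len(columns)
    row[i] = row[j] = 1                              # one 1 in each of the two subsets
    rows.append(row)

# Each candidate row is orthogonal to k, hence it lies in the row span of A.
dots = [sum(a * b for a, b in zip(row, k)) for row in rows]
```

Each column of the resulting 0-1 matrix contains exactly four 1s, one for each possible partner in the opposite subset.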
We denote by $s_1, \dots, s_{16}$ the parameters of the H-model. The possible
reduced supports of limit distributions of the A-model are the intersections
of the subsets of 6 zeros in each of 16 rows of $H$. For example, if we set
$s_1 = 0$ in the H-model, then the set $\{+++, ---\}$ has zero probability
and the limit model matrix is obtained by conditioning the A-model to the
support set $\mathcal{X}_1$,

\[ A^{1} = \begin{array}{c|cccccc}
   & -++ & +-+ & --+ & ++- & -+- & +-- \\ \hline
0  & 1 & 1 & 1 & 1 & 1 & 1 \\
1  & 1 & 0 & 1 & 0 & 1 & 0 \\
2  & 0 & 1 & 1 & 0 & 0 & 1 \\
3  & 0 & 0 & 0 & 1 & 1 & 1 \\
12 & 1 & 1 & 0 & 0 & 1 & 1 \\
13 & 1 & 0 & 1 & 1 & 0 & 1 \\
23 & 0 & 1 & 1 & 1 & 1 & 0
\end{array} \]

On the subset $\mathcal{X}_1$ the aliasing relation is
$X_1 X_2 + X_1 X_3 + X_2 X_3 = -1$,
therefore one of the interactions depends on the other two interactions. The
submatrix with one interaction's row deleted is non-singular. In conclusion,
the limit model is the saturated model, i.e. the full simplex of probabilities
on $\mathcal{X}_1$.

We pass now to the discussion of the constrained model. The equality
of the marginal distributions reduces to the constraint matrix

\[ C = \begin{array}{cccccccc}
+++ & -++ & +-+ & --+ & ++- & -+- & +-- & --- \\ \hline
0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 \\
0 & 1 & 0 & 1 & -1 & 0 & -1 & 0
\end{array} \]

In terms of the parameters $s_1, \dots, s_{16}$ of the H-model the
constraints are

\[ s_5 s_8 s_{11} s_{14} + s_6 s_{14} s_{15} s_{16} - s_3 s_9 s_{12} s_{15} - s_2 s_3 s_5 s_7 = 0 , \]
\[ s_5 s_8 s_{11} s_{14} + s_4 s_{11} s_{12} s_{13} - s_2 s_{10} s_{13} s_{16} - s_2 s_3 s_5 s_7 = 0 . \]

The intersection of the previous variety with the necessary condition for a
border case, i.e. $s_1 \cdots s_{16} = 0$, gives the equations of the
constraint on the border.

Notes

The theory of toric ideals is due to Sturmfels [11]. We do not discuss here an
important topic of this area, namely Markov Bases which were introduced
in another seminal paper by Diaconis and Sturmfels [12]. The border of an
A-model is discussed in detail in Kahle's thesis [13] together with a
generalization to general exponential families due to Rauh, Kahle and Ay [14].
Here we have associated the border to a special version of the A-model using a
Hilbert basis as set of canonical statistics, an idea which is mentioned first
in Rapallo's thesis. Proofs are published in Malagò and Pistone. Hilbert
basis computations were done using 4ti2 [17] and CoCoA [18]. Another
important topic we do not discuss here is Birch's theorem, see the exposition
by Pachter and Sturmfels [19]. The discussion outlined in the Examples is new.

5. Differentiation of the normalizing constant 

A key result in exponential families is the relation of the partial derivatives 
of the cumulant function with the cumulants of the canonical statistics. 
In particular, the gradient of the cumulant generating function maps the 
canonical parameters onto the interior of the convex polytope generated by 
the values of the canonical statistics. We discuss here a version of this in 
the case of A-models. 

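We call design any finite set of real vectors. The image of a LEF under
the canonical statistics is the canonical LEF. Its support is a design
$\mathcal{D} \subset \mathbb{Z}^{m}$. In particular, the canonical version of
an A-model is supported by the design
$\mathcal{D} \subset \mathbb{Z}_{\ge 0}^{m}$ whose points are the columns of
the model matrix.

The set of all polynomials which are zero on a design is the design ideal
$I(\mathcal{D})$. The canonical A-model has the form

\[ q(x; t) = \prod_{i=1}^{m} t_i^{x_i}, \quad x \in \mathcal{D}, \ t_j > 0, \ j = 1, \dots, m, \]

with normalizing constant (partition function)

\[ Z(t) = \sum_{x \in \mathcal{D}} t^{x} \mu(x) . \]

In the Weyl algebra
$\mathbb{C}\langle t_1, \dots, t_m, \partial_1, \dots, \partial_m \rangle$ we
define the operators

\[ t_i \partial_i - x_i = \partial_i t_i - (1 + x_i), \quad i = 1, \dots, m, \ x \in \mathcal{D}, \]

where the equality follows from the commutation relation
$\partial_i t_i = 1 + t_i \partial_i$. For all $x \in \mathcal{D}$ we have

\[ (t_i \partial_i - x_i) \bullet t^{x} = \partial_i \bullet (t_i t^{x}) - (1 + x_i) t^{x} = 0 , \]

so that $t_i \partial_i \bullet t^{x} = x_i t^{x}$ and, by iteration,
$(t_i \partial_i)^{\alpha} \bullet t^{x} = x_i^{\alpha} t^{x}$,
$\alpha \in \mathbb{Z}_{\ge 0}$.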
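The operator $(t_i \partial_i)^{\alpha}$ applied to the polynomial
$Z(t) \in \mathbb{C}[t_1, \dots, t_m]$ gives

\[ (t_i \partial_i)^{\alpha} \bullet Z(t) = \sum_{x \in \mathcal{D}} (t_i \partial_i)^{\alpha} \bullet t^{x} \mu(x) = \sum_{x \in \mathcal{D}} x_i^{\alpha} t^{x} \mu(x) . \]

For $i \ne j$ we have the commutation
$(t_i \partial_i)(t_j \partial_j) = (t_j \partial_j)(t_i \partial_i)$, hence

\[ \prod_{i=1}^{m} (t_i \partial_i)^{\alpha_i} \bullet Z(t) = \sum_{x \in \mathcal{D}} \prod_{i=1}^{m} (t_i \partial_i)^{\alpha_i} \bullet t^{x} \mu(x) = \sum_{x \in \mathcal{D}} \Big( \prod_{i=1}^{m} x_i^{\alpha_i} \Big) t^{x} \mu(x) . \]

By dividing by the normalizing constant we obtain the following expression
for the moments:

\[ Z(t)^{-1} \prod_{i=1}^{m} (t_i \partial_i)^{\alpha_i} \bullet Z(t) = Z(t)^{-1} \sum_{x \in \mathcal{D}} \prod_{i=1}^{m} (t_i \partial_i)^{\alpha_i} \bullet t^{x} \mu(x) = \mathrm{E}_t[X^{\alpha}] . \]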
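The moment formula can be tried out on the one-parameter binomial model, where $Z(t) = \sum_x \binom{n}{x} t^x = (1+t)^n$. The sketch below represents a univariate polynomial as a dictionary from exponents to coefficients; this encoding and the names `euler`, `evaluate` and `moment` are assumptions of the illustration.

```python
from math import comb

def euler(poly):
    """Apply the operator t * d/dt to a polynomial {exponent: coefficient}."""
    return {e: e * c for e, c in poly.items()}

def evaluate(poly, t):
    """Value of the polynomial at the point t."""
    return sum(c * t ** e for e, c in poly.items())

n = 6
Z = {x: comb(n, x) for x in range(n + 1)}     # Z(t) = sum_x C(n, x) t^x = (1 + t)^n

def moment(alpha, t):
    """E_t[X^alpha] computed as Z(t)^{-1} (t d/dt)^alpha applied to Z(t)."""
    poly = Z
    for _ in range(alpha):
        poly = euler(poly)
    return evaluate(poly, t) / evaluate(Z, t)

t = 0.5
mean = moment(1, t)            # binomial mean n * t / (1 + t)
second = moment(2, t)          # second moment of the same binomial distribution
```

For a binomial distribution with success probability $t/(1+t)$ the two values can be compared with the classical formulas for the mean and the variance.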

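From the ring homomorphism

\[ A : \mathbb{C}[x_1, \dots, x_m] \to \mathbb{C}\langle t_1, \dots, t_m, \partial_1, \dots, \partial_m \rangle, \qquad x_i \mapsto t_i \partial_i, \]

we have for each polynomial $f \in \mathbb{C}[x_1, \dots, x_m]$

\[ A(f) \bullet Z(t) = \sum_{x \in \mathcal{D}} f(x)\, t^{x} \mu(x) . \]

As $x \in \mathcal{D}$, the polynomial $f$ is identified up to an element of
the design ideal. The quotient ring
$\mathbb{R}[x_1, \dots, x_m] / I(\mathcal{D})$ has a linear basis
$\{x^{\alpha} : \alpha \in M\}$ of monomials called monomial basis, with
$N = \# M = \# \mathcal{D}$ elements.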

Proposition 5.1.

(1) Let $\{x^{\alpha} : \alpha \in M\}$ be a monomial basis for $\mathcal{D}$.
Then $Z(t)$ satisfies the following system of $N$ linear non-homogeneous
differential equations:

\[ A(x^{\alpha}) \bullet Z(t) = \sum_{x \in \mathcal{D}} x^{\alpha} t^{x}, \quad \alpha \in M . \]

(2) Let $f_a(x)$ be the (reduced) indicator polynomial of $a \in \mathcal{D}$.
Then $Z(t)$ satisfies the following system of $N$ linear non-homogeneous
differential equations:

\[ A(f_a) \bullet Z(t) = t^{a}, \quad a \in \mathcal{D} . \]

(3) Let $g(p_a : a \in \mathcal{D})$ be a polynomial in the toric ideal of the
monomial homomorphism $p_a \mapsto t^{a}$. Then

\[ g\big( A(f_a(x)) \bullet Z(t) : a \in \mathcal{D} \big) = 0 . \]

In the previous proposition, if the right-hand sides in the first two Items
are expressed in terms of known moments, the equations are homogeneous, e.g.
Item 2 becomes

\[ A(f_a) \bullet Z(t) = p(a; t)\, Z(t), \quad a \in \mathcal{D} . \]

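5.1. Example: 3 binary variables, no 23- and 123-interactions

The model matrix is the same as in Eq. (12) with the 23-row deleted:

\[ A = \begin{array}{c|cccccccc}
   & +++ & -++ & +-+ & --+ & ++- & -+- & +-- & --- \\ \hline
0  & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1  & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
2  & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
3  & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
12 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
13 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0
\end{array} \]

The orthogonal space is generated by the missing interactions $X_2 X_3$ and
$X_1 X_2 X_3$. Computations were done with the software CoCoA. A monomial
basis of the design is

\[ 1, \ x_{13}, \ x_{12}, \ x_3, \ x_2, \ x_1, \ x_{12} x_{13}, \ x_2 x_{13} , \]

and the indicator polynomial of the column $A(-++) = 110011$ is
expressed in this monomial basis by

\[ f_{-++}(x) = \tfrac{1}{2} x_2 x_{13} - \tfrac{1}{2} x_{12} x_{13} - \tfrac{1}{4} x_1 + \tfrac{1}{4} x_3 - \tfrac{1}{4} x_{13} + 1 . \]

It follows that the differential operator of Proposition 5.1(2) is

\[ A(f_{-++}) = \tfrac{1}{2}\, t_2 \partial_2\, t_5 \partial_5 - \tfrac{1}{2}\, t_4 \partial_4\, t_5 \partial_5 - \tfrac{1}{4}\, t_1 \partial_1 + \tfrac{1}{4}\, t_3 \partial_3 - \tfrac{1}{4}\, t_5 \partial_5 + 1 . \]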

Notes 

Here we use the algebraic theory of design which was presented first by
Pistone and Wynn [20] and discussed in detail in the quoted monograph [8].
The practical interest and feasibility of the resulting computations is the
object of current research.
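6. Markov chain, toric Markov chain

We consider in this section a homogeneous irreducible Markov chain
$X_k$, $k = 0, 1, \dots$, with state space $V$, initial probability $\pi_0$,
transitions $P_{v \to w}$, $v, w \in V$. Let
$\mathcal{A} = \{v \to w : P_{v \to w} > 0, v \ne w\}$ and
$\mathcal{L} = \{v \to v : P_{v \to v} > 0, v \in V\}$. The transitions in
$\mathcal{L}$ are called loops. The directed graph of transitions
$(V, \mathcal{A} \cup \mathcal{L})$ is defined by
$v \to w \in \mathcal{A} \cup \mathcal{L}$ if, and only if,
$P_{v \to w} > 0$, $v, w \in V$, and it is connected. Let $\omega$ be a
trajectory with positive probability, i.e. a path of the graph of transitions,
$\omega = \omega_0 \omega_1 \cdots \omega_n$ with
$(\omega_{i-1} \to \omega_i) \in \mathcal{A} \cup \mathcal{L}$,
$i = 1, \dots, n$. The set of trajectories with $n$ transitions is denoted by
$\Omega_n$. For each $\omega \in \Omega_n$, the transition count is the
integer $V \times V$-matrix with elements
$N_{v \to w}(\omega) = \sum_{k=1}^{n} (X_{k-1} = v, X_k = w)$, where
$(\cdot)$ denotes the indicator function of the event.

The joint distribution up to the time $n$ on the sample space $\Omega_n$ is a
monomial term in the ring
$\mathbb{Q}[\pi_0(v), v \in V; P_a, a \in \mathcal{A} \cup \mathcal{L}]$,

\[ P_n(\omega) = \pi_0(\omega_0) P_{\omega_0 \to \omega_1} \cdots P_{\omega_{n-1} \to \omega_n}
 = \prod_{v \in V} \pi_0(v)^{(X_0(\omega) = v)} \prod_{l \in \mathcal{L}} P_l^{N_l(\omega)} \prod_{a \in \mathcal{A}} P_a^{N_a(\omega)} . \tag{13} \]

The sparse matrix whose rows have indexes in $0, V, \mathcal{A} \cup \mathcal{L}$,
whose columns have indexes in $\Omega_n$, such that the column $\omega$ is
$1$, $((X_0(\omega) = v) : v \in V)$,
$(N_a(\omega) : a \in \mathcal{A} \cup \mathcal{L})$, defines a toric
statistical model on $\Omega_n$ called toric Markov chain (TMC).

The unnormalized density up to the time $n$ of the toric model is

\[ q_n(\omega; t) = t_0 \prod_{v \in V} t_v^{(X_0(\omega) = v)} \prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(\omega)} . \tag{14} \]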
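A small enumeration makes the objects of this section concrete. The sketch below lists the trajectories of a hypothetical two-state graph with one loop, computes their transition counts and evaluates the unnormalized density of the toric model; the graph, the integer parameter values and the function names are assumptions of this illustration.

```python
from itertools import product

def trajectories(vertices, arcs, n):
    """All paths with n transitions in the directed graph (vertices, arcs)."""
    arcset = set(arcs)
    for omega in product(vertices, repeat=n + 1):
        if all((omega[k - 1], omega[k]) in arcset for k in range(1, n + 1)):
            yield omega

def transition_count(omega):
    """Dictionary {(v, w): N_{v->w}(omega)} of the observed transitions."""
    counts = {}
    for a in zip(omega, omega[1:]):
        counts[a] = counts.get(a, 0) + 1
    return counts

def q_n(omega, t0, t_vertex, t_arc):
    """Unnormalized density: t0 * t_{omega_0} * prod_a t_a ** N_a(omega)."""
    value = t0 * t_vertex[omega[0]]
    for a, count in transition_count(omega).items():
        value *= t_arc[a] ** count
    return value

vertices = ["a", "b"]
arcs = [("a", "a"), ("a", "b"), ("b", "a")]           # one loop and two arcs
t_vertex = {"a": 2, "b": 3}
t_arc = {("a", "a"): 5, ("a", "b"): 7, ("b", "a"): 11}
paths = list(trajectories(vertices, arcs, 3))
Z = sum(q_n(omega, 1, t_vertex, t_arc) for omega in paths)
```

The columns of the sparse model matrix described above are exactly the vectors formed by the constant 1, the indicator of the initial vertex and the dictionary returned by `transition_count`.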

For example, if the state space is $V = \{+1, -1\}$, a direct computation
shows that the TMC is, up to a linear transformation of the canonical
parameters, equal to the constrained model of the type of Eq. (1), namely

\[ \log q_{\alpha, \beta} = \alpha_0 X_0 + \alpha \sum_{t=1}^{n-1} X_t + \alpha_n X_n + \beta \sum_{t=1}^{n} X_{t-1} X_t . \]

The constraint is a linear constraint on the canonical parameters:
$\alpha_t = \alpha$, $t = 1, \dots, n-1$, and $\beta_t = \beta$,
$t = 1, \dots, n$. In fact, the constrained model is an exponential family.

In the TMC the $t$ parameters are not required to be transition
probabilities, therefore the Markov chain model is a submodel of the TMC model
given by linear restrictions on the $t$'s. It follows that the MC is not an
exponential family, but it is a curved exponential family.

More precisely, as it is seen from the product form, the TMC from time
0 to time $n$ is a special Markov process with non-homogeneous transition
probabilities. If the transition probabilities are homogeneous, then the TMC
is a Markov chain. Moreover, we note that the joint distributions of the
TMC do not form a projective system for different $n$'s; in particular,
classical results on the mixture representation of processes whose sufficient
statistics are transition counts do not apply here.

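Proposition 6.1. Let us define the vector
$S(v) = \sum_{w:\, v \to w \in \mathcal{A} \cup \mathcal{L}} t_{v \to w}$,
$v \in V$. The TMC on $\Omega_n$ is a MC if, and only if, it is constant, say
$S(v) = S$, $v \in V$.

Proof. We denote by $P_n(\omega)$ the probability of a trajectory
$\omega \in \Omega_n$. A transition matrix $P$ is obtained by normalizing
the $t_a$'s,

\[ P_{v \to w} = \frac{t_{v \to w}}{S(v)}, \qquad S(v) = \sum_{w \in V} t_{v \to w}, \qquad (v \to w) \in \mathcal{A} \cup \mathcal{L}, \]

and $P_{v \to w} = 0$ if $(v \to w) \notin \mathcal{A} \cup \mathcal{L}$. As
$q_n(v_0 v_1 \cdots v_{n-1} v_n; t) = q_{n-1}(v_0 v_1 \cdots v_{n-1}; t)\, t_{v_{n-1} \to v_n}$,
the marginal unnormalized density up to time $(n-1)$ is

\[ \sum_{v_n} q_n(v_0 v_1 \cdots v_{n-1} v_n; t) = q_{n-1}(v_0 v_1 \cdots v_{n-1}; t)\, S(v_{n-1}) , \]

and the conditional probabilities are

\[ P_n(X_n = v_n \mid X_{n-1} = v_{n-1}, \dots, X_0 = v_0) = \frac{q_{n-1}\, t_{v_{n-1} \to v_n}}{q_{n-1}\, S(v_{n-1})} = P_{v_{n-1} \to v_n} . \]

The marginal unnormalized density up to time $(n-2)$ is

\[ \sum_{v_{n-1}} q_{n-1} S(v_{n-1}) = q_{n-2} \sum_{v_{n-1}} t_{v_{n-2} \to v_{n-1}} S(v_{n-1}) , \]

and the conditional probabilities are given by

\[ P_n(X_{n-1} = v_{n-1} \mid X_{n-2} = v_{n-2}, \dots, X_0 = v_0) = \frac{t_{v_{n-2} \to v_{n-1}}\, S(v_{n-1})}{\sum_{u} t_{v_{n-2} \to u}\, S(u)} . \]

For generic vertices $v, w \in V$ we have:

\[ \frac{P_n(X_n = w \mid X_{n-1} = v)}{P_n(X_{n-1} = w \mid X_{n-2} = v)} = \frac{\sum_{u} t_{v \to u}\, S(u)}{S(v)\, S(w)} , \]

hence the TMC is a homogeneous MC if, and only if,

\[ \frac{\sum_{u} t_{v \to u}\, S(u)}{S(v)} = S(w) , \]

i.e. if and only if $S(v) = S$, $v \in V$. In such a case $t_{v \to w}$ is
proportional to the transition matrix $P_{v \to w}$. □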
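Proposition 6.1 can be checked by brute force on a two-state example. The following sketch computes the normalized TMC distribution with exact rational arithmetic and compares the conditional probabilities of the transitions at different times; the two parameter tables and all function names are assumptions of this illustration.

```python
from itertools import product
from fractions import Fraction

def tmc_probabilities(vertices, t_vertex, t_arc, n):
    """Normalized TMC distribution on the trajectories with n transitions."""
    q = {}
    for omega in product(vertices, repeat=n + 1):
        value = Fraction(t_vertex[omega[0]])
        for a in zip(omega, omega[1:]):
            value *= t_arc.get(a, 0)          # missing arcs have weight 0
        if value:
            q[omega] = value
    Z = sum(q.values())
    return {omega: value / Z for omega, value in q.items()}

def conditional(p, k, v, w):
    """P(X_k = w | X_{k-1} = v), computed directly from the joint distribution."""
    num = sum(pr for om, pr in p.items() if om[k - 1] == v and om[k] == w)
    den = sum(pr for om, pr in p.items() if om[k - 1] == v)
    return num / den

def is_homogeneous(p, vertices, n):
    """True if the one-step conditionals do not depend on the time index."""
    return all(conditional(p, k, v, w) == conditional(p, n, v, w)
               for k in range(1, n) for v in vertices for w in vertices)

V = ["a", "b"]
start = {"a": 1, "b": 1}
constant_S = {("a", "a"): 1, ("a", "b"): 3, ("b", "a"): 2, ("b", "b"): 2}  # S(a) = S(b) = 4
varying_S = {("a", "a"): 1, ("a", "b"): 1, ("b", "a"): 1, ("b", "b"): 3}   # S(a) = 2, S(b) = 4

p_const = tmc_probabilities(V, start, constant_S, 3)
p_vary = tmc_probabilities(V, start, varying_S, 3)
```

With `constant_S` the conditionals agree at every time step, while with `varying_S` the last transition and the earlier ones have different conditional probabilities.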

A second description of the relation between the MC and the TMC
follows by writing $t_{v \to w}$ as $S(v) P_{v \to w}$ in Eq. (14). The
unnormalized density $q_n(\omega; t)$ can be re-written as

\[ q_n(\omega; t) = t_0 \prod_{v \in V} t_v^{(X_0(\omega) = v)} \prod_{v \to w \in \mathcal{A} \cup \mathcal{L}} \big( S(v) P_{v \to w} \big)^{N_{v \to w}(\omega)}
 = \Big( t_0 \prod_{v \in V} t_v^{(X_0(\omega) = v)} \prod_{v \to w \in \mathcal{A} \cup \mathcal{L}} P_{v \to w}^{N_{v \to w}(\omega)} \Big) \prod_{v \in V} S(v)^{N_{v \to \cdot}(\omega)} , \]

where $N_{v \to \cdot} = \sum_{w \in \mathrm{out}(v)} N_{v \to w}$ is the
number of exits from $v$, including transitions from $v$ to $v$ itself. This
distribution is the distribution of a MC with the added weight
$\prod_{v \in V} S(v)^{N_{v \to \cdot}}$. Or, we can say that the TMC, given
the number of outs $N_{v \to \cdot}$, $v \in V$, is a MC.

The normalizing constant of a TMC on a graph
$(V, \mathcal{A} \cup \mathcal{L})$ is
$Z(t) = \sum_{\omega \in \Omega_n} q(\omega; t)$ and the normal equations for
the maximum likelihood estimation can be written as polynomial equations with
the operator introduced in Sec. 5 as
$t_a \partial_a Z(t) = N_a(\omega) Z(t)$,
$a \in \mathcal{A} \cup \mathcal{L}$. The existence of a solution of the
normal equations stems from the general theory of exponential families if the
observed values of the transition counts are strictly positive. If this is
not the case, we can find a border solution using the following
characterization of transition counts.

Proposition 6.2. An integer matrix
$N \in \mathbb{Z}_{\ge 0}^{V \times V}$ is the transition count of
a trajectory $\omega$ if, and only if, it is connected and
$\sum_{w} N_{v \to w} - \sum_{w} N_{w \to v}$
is either 0 for all $v \in V$ or is $+1$ for a vertex $v_0$, $-1$ for a
different vertex $v_n$, and zero for all $v \ne v_0, v_n$.

An observed transition count $N(\omega)$ is connected and defines a
subgraph of the model graph by the positivity condition $N_a(\omega) > 0$. It
follows that the normal equations have a solution in a submodel.
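The degree condition of Proposition 6.2 is straightforward to test in code. The sketch below checks only the balance between exits and entries at each vertex (connectedness is deliberately not checked); the dictionary encoding of the count matrix and the function names are assumptions of this illustration.

```python
def transition_count(omega):
    """Dictionary {(v, w): N_{v->w}(omega)} for a trajectory given as a sequence."""
    counts = {}
    for a in zip(omega, omega[1:]):
        counts[a] = counts.get(a, 0) + 1
    return counts

def balance(counts):
    """Number of exits minus number of entries at every vertex in the counts."""
    b = {}
    for (v, w), c in counts.items():
        b[v] = b.get(v, 0) + c
        b[w] = b.get(w, 0) - c
    return b

def satisfies_balance_condition(counts):
    """Degree condition of Proposition 6.2; connectedness is NOT checked here."""
    nonzero = sorted(x for x in balance(counts).values() if x != 0)
    return nonzero == [] or nonzero == [-1, 1]

open_path = transition_count(("a", "b", "c", "a", "b"))   # starts at a, ends at b
closed_path = transition_count(("a", "b", "a"))           # closed trajectory
not_a_count = {("a", "b"): 2}                             # two exits from a, no return
```

For an open trajectory the balance is $+1$ at the initial vertex and $-1$ at the final one, and it vanishes everywhere on a closed trajectory.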

We have defined a toric ideal associated with the unnormalized densities
on $\Omega_n$. Now we want to consider all trajectories, i.e. trajectories of
any length $n$. We need first to fix a more precise language for closed
trajectories.

A closed trajectory $\omega = v_0 v_1 \cdots v_{n-1} v_0$ is any trajectory
going from an initial $v_0$ back to $v_0$;
$r\omega = v_0 v_{n-1} \cdots v_1 v_0$ is the reversed closed trajectory.



19.11.2011 



G Pistone & MP Rogantin 




If we do not distinguish any initial vertex, the equivalence class of closed
trajectories is called a trail. A closed trajectory is elementary if it has no
proper closed sub-trajectory, i.e. if it does not meet twice the same vertex
except the initial one $v_0$. The trail of an elementary closed path is a
cycle. The set of cycles $\mathcal{C}$ is finite. A trajectory
$\omega = \omega_0 \omega_1 \cdots \omega_n$ is elementary if it does not
contain any cycle.

Given a transition matrix $P$ and an initial probability $\pi_0$, we compute
the probability of any trajectory as
$P(\omega) = \pi_0(\omega_0) \big( \prod_{v, w \in V} P_{v \to w}^{N_{v \to w}(\omega)} \big)$.
The factor $\prod_{v, w \in V} P_{v \to w}^{N_{v \to w}(\omega)}$ does not
depend on the initial point $\omega_0$ if the trajectory is closed; in fact in
such a case the matrix $N(\omega)$ is a function of the trail only.

Definition 6.1. Consider the ring
$k[t_0; t_v, v \in V; t_a, a \in \mathcal{A} \cup \mathcal{L}]$.

(1) The Markov monomial ideal is the monomial ideal generated by the
monomials
$t_0 \prod_{v \in V} t_v^{(X_0(\omega) = v)} \prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(\omega)}$,
where $\omega$ is a trajectory, $\omega \in \Omega = \cup_n \Omega_n$.

(2) The stationary Markov ideal is the ideal generated by the
Markov monomial ideal and by the equations of stationarity
$\sum_{v:\, (v \to w) \in \mathcal{A} \cup \mathcal{L}} t_v\, t_{v \to w} = t_w$,
$w \in V$.

(3) The ideal of closed trajectories is the monomial ideal generated by the
monomials $\prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(\omega)}$,
$\omega$ closed, in the ring $k[t_a : a \in \mathcal{A} \cup \mathcal{L}]$.

Proposition 6.3.

(1) For each trajectory $\omega$ there exist an elementary sub-trajectory
$\omega_e$, possibly empty, and nonnegative integers $\lambda(c)$,
$c \in \mathcal{C}$, such that
$N(\omega) = N(\omega_e) + \sum_{c \in \mathcal{C}} \lambda(c) N(c)$. The
matrices $N(\omega_e)$ and $N(c)$, $c \in \mathcal{C}$, are boolean. If
$\omega_e$ is not empty, it has the same initial and final point as $\omega$.

(2) The monomial ideal of closed trajectories is generated by the cycles.
The monomials associated to a cycle are square-free.

(3) The Markov monomial ideal is generated by the cycles and the elementary
trajectories.

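Proof.

(1) Let $\omega = v_0 \cdots v_n$ be a trajectory and consider the first
closed trajectory encountered,
$v_0 \cdots v_h \cdots v_k (= v_h) v_{k+1} \cdots v_n$, if any. Then
$c_1 = v_h v_{h+1} \cdots v_k (= v_h)$ is an elementary closed trajectory and
$\omega_r = v_0 \cdots v_h v_{k+1} \cdots v_n$ is either empty or a
trajectory. Hence $N(\omega) = N(\omega_r) + N(c_1)$ and the iteration stops
after a finite number of steps.

(2) If $\omega$ is closed, then
$N(\omega) = \sum_{c \in \mathcal{C}} \lambda(c) N(c)$, hence
$\prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(\omega)} = \prod_{c \in \mathcal{C}} \big( \prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(c)} \big)^{\lambda(c)}$.
On a closed trajectory $\omega$, the matrix of transition counts

\[ [N_{v,w}(\omega)]_{v, w \in V} = \sum_{k=1}^{n} (X_{k-1}(\omega) = v, X_k(\omega) = w) \]

has row sums equal to column sums, i.e. there are as many ins as outs
at each vertex, see Prop. 6.2.

(3) If the trajectory is closed, then we can apply the previous item.
Otherwise $\omega$ and $\omega_e$ start at the same vertex and

\[ t_0 \prod_{v \in V} t_v^{(X_0(\omega) = v)} \prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(\omega)}
 = t_0 \prod_{v \in V} t_v^{(X_0(\omega_e) = v)} \prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(\omega_e)} \prod_{c \in \mathcal{C}} \Big( \prod_{a \in \mathcal{A} \cup \mathcal{L}} t_a^{N_a(c)} \Big)^{\lambda(c)} . \]
□

In a sense, the joint distribution of each trajectory is characterized by
the Markov monomial ideal.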

Proposition 6.4. Let $P$ be an irreducible Markov matrix. The distribution
of the stationary Markov chain is a function of the transition monomials of
the cycles.

Proof. Consider $v \to w$ with $P_{v \to w} > 0$ and define the sets
$\Omega_k(w \to v)$ of the trajectories from $w$ to $v$ of length $k$,
$k = 1, 2, \dots$. For each $\omega \in \Omega_k(w \to v)$, the trajectory
$(v \to w)\omega = v(\omega_0 = w) \cdots (\omega_k = v)$ is closed.
Because of the Markov property and irreducibility, we have the Cesàro limit
$\pi(v) = \lim_{k \to \infty} P^{(k)}_{w \to v} > 0$, so that

\[ \lim_{k \to \infty} \sum_{\omega \in \Omega_k(w \to v)} P((v \to w)\omega) / \pi(v)
 = \lim_{k \to \infty} \sum_{\omega \in \Omega_k(w \to v)} P_{v \to w}\, P(\omega \mid \omega_0 = w)
 = P_{v \to w} \lim_{k \to \infty} P^{(k)}_{w \to v} = \pi(v) P_{v \to w} . \]

As $P((v \to w)\omega) / \pi(v)$ is a product of cycle monomials because of
Prop. 6.3(2), all values $P(v, w) = \pi(v) P_{v \to w}$ of the 2-step joint
distribution depend on the values of the cycle monomials. □

We note that the values of the cycle monomials are dependent. For example,
in the complete graph with 4 vertices there are 20 cycles of length at least
2 (24 including the 4 loops), while the number of degrees of freedom for a
generic transition probability on 4 points is 12.
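The cycles of a small complete directed graph can be enumerated directly, which gives a feeling for how quickly their number exceeds the number of free transition parameters. In this sketch a cycle is recorded once, starting from its smallest vertex, so that rotations of the same closed path are not counted twice, while a cycle and its reversal are counted separately; the function name `directed_cycles` is an assumption of this illustration.

```python
from itertools import permutations

def directed_cycles(vertices):
    """Cycles of the complete directed graph with loops, as tuples of vertices.

    The tuple (v_1, ..., v_k) stands for the closed path v_1 -> ... -> v_k -> v_1.
    Only the rotation starting at the smallest vertex is kept.
    """
    cycles = []
    for length in range(1, len(vertices) + 1):
        for perm in permutations(vertices, length):
            if perm[0] == min(perm):
                cycles.append(perm)
    return cycles

cycles = directed_cycles([1, 2, 3, 4])
by_length = {}
for c in cycles:
    by_length[len(c)] = by_length.get(len(c), 0) + 1

free_parameters = 4 * (4 - 1)    # each row of a 4 x 4 transition matrix sums to 1
```

The dictionary `by_length` separates the loops (length 1) from the longer cycles.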
Notes

The name TMC was used first by Pachter and Sturmfels [19, Ch. 1 Statistics].
See also Hara and Takemura [21, 22] for slightly different definitions. The
representation of processes whose sufficient statistics are transition counts
is a version of de Finetti exchangeability, see Freedman [23]. Prop. 6.2 is
proved in Grande. A TMC is a particular case of a graphical model in the
sense of Lauritzen [25]. A related issue is the characterization of the
distribution of stationary Markov chains via a mixture on monomials on cycles
that was obtained by McQueen [26], cfr. Kalpazidou [27].



7. Reversible Markov chains 

A transition matrix $P_{v \to w}$, $v, w \in \mathcal{V}$, satisfies the detailed balance (DB)
condition if $\kappa(v) P_{v \to w} = \kappa(w) P_{w \to v}$, $v, w \in \mathcal{V}$, for some strictly positive
$\kappa(v) > 0$, $v \in \mathcal{V}$. As a consequence, $\pi(v) \propto \kappa(v)$ is an invariant probability
and the stationary Markov chain $(X_n)_{n \ge 0}$ has reversible two-step joint
distribution $\mathbb{P}(X_n = v, X_{n+1} = w) = \mathbb{P}(X_n = w, X_{n+1} = v)$, $v, w \in \mathcal{V}$, $n \ge 0$.
The distribution of the Markov chain is uniquely parameterized by its
symmetric two-step joint distribution.

The DB assumption is trivially satisfied for $v = w$ and moreover $P_{v \to w} > 0$
if, and only if, $P_{w \to v} > 0$. Given an undirected connected graph
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with no loops and the directed graph $(\mathcal{V}, \mathcal{A})$ whose arcs are the two
directions of each edge, we consider here all transition probabilities such
that $P_{v \to w} = 0$, $v \ne w$, if, and only if, $vw \notin \mathcal{E}$. Let $\mathcal{L}$ denote the loops with
positive transition $P_{v \to v}$ and $\pi$ the invariant probability.

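For each trajectory $\omega = v_0 \cdots v_n$ in the graph $\mathcal{G}$ let $r\omega = v_n \cdots v_0$ be
the reversed trajectory. The reversed probability is $\mathbb{P}_r(\omega) = \mathbb{P}(r\omega)$. From
Eq. (13) we compute the likelihood ratio

$$\frac{\mathbb{P}(\omega)}{\mathbb{P}_r(\omega)} = \prod_{v \in \mathcal{V}} \pi(v)^{(X_0 = v) - (X_n = v)} \prod_{a \in \mathcal{A}} \left( \frac{P_a}{P_{r(a)}} \right)^{N_a} ,$$

the log-likelihood ratio

$$\log \frac{\mathbb{P}(\omega)}{\mathbb{P}_r(\omega)} = \sum_{v \in \mathcal{V}} \log(\pi(v)) \big( (X_0 = v) - (X_n = v) \big) + \sum_{a \in \mathcal{A}} \log\left( \frac{P_a}{P_{r(a)}} \right) N_a .$$

Because of the stationarity of $\pi$, the divergence is

$$D(\mathbb{P} \| \mathbb{P}_r) = \sum_{v \in \mathcal{V}} \log(\pi(v)) \, E_{\mathbb{P}}\big[ (X_0 = v) - (X_n = v) \big] + \sum_{a \in \mathcal{A}} \log\left( \frac{P_a}{P_{r(a)}} \right) E_{\mathbb{P}}[N_a]$$

$$= n \sum_{(v \to w) \in \mathcal{A}} P(v,w) \log \frac{P_{v \to w}}{P_{w \to v}}$$

$$= n \sum_{(v \to w) \in \mathcal{A}} P(v,w) \log \frac{P(v,w)}{P(w,v)} ,$$

where $P(v,w)$ is the two-step joint distribution. As the divergence is zero
if, and only if, the probabilities are equal, the DB condition is equivalent
to $\mathbb{P} = \mathbb{P}_r$. The last statement could be easily derived otherwise, but the
computation of the divergence has an independent interest, e.g. it shows
the linear dependence on $n$ of the divergence.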
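
The linear dependence on $n$ can be checked numerically. The sketch below compares a brute-force sum over all trajectories with $n$ times the divergence between the two-step joint distribution and its transpose. The function names, the two matrices and the use of power iteration for the invariant probability are illustrative assumptions, not part of the text.

```python
from itertools import product
from math import isclose, log

def stationary(P, iterations=2000):
    """Approximate the invariant probability of P by power iteration."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[v] * P[v][w] for v in range(m)) for w in range(m)]
    return pi

def path_probability(pi, P, path):
    """Probability of a trajectory of the stationary chain."""
    prob = pi[path[0]]
    for v, w in zip(path, path[1:]):
        prob *= P[v][w]
    return prob

def divergence_brute_force(pi, P, n):
    """Sum of P(omega) * log(P(omega) / P(r omega)) over all trajectories with n steps."""
    total = 0.0
    for path in product(range(len(P)), repeat=n + 1):
        p = path_probability(pi, P, path)
        total += p * log(p / path_probability(pi, P, path[::-1]))
    return total

def divergence_two_step(pi, P, n):
    """n times the divergence between the two-step joint distribution and its transpose."""
    m = len(P)
    return n * sum(
        pi[v] * P[v][w] * log((pi[v] * P[v][w]) / (pi[w] * P[w][v]))
        for v in range(m) for w in range(m)
    )

# A strictly positive transition matrix that is not reversible (illustrative values).
P = [[0.1, 0.6, 0.3],
     [0.2, 0.1, 0.7],
     [0.5, 0.3, 0.2]]
pi = stationary(P)

# A symmetric transition matrix, hence reversible with uniform invariant probability.
P_sym = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]
pi_sym = stationary(P_sym)

for n in (1, 2, 3):
    print(n, divergence_brute_force(pi, P, n), divergence_two_step(pi, P, n))
print(divergence_two_step(pi_sym, P_sym, 3))
```

For the first matrix the two computations should agree up to rounding and grow proportionally to $n$; for the symmetric matrix the divergence should vanish up to rounding.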

Let $\omega$ be a closed trajectory and let $r\omega$ be its reversed trajectory. In the
previous section we have shown that the distribution of the Markov chain is
uniquely characterized by the initial probability, the loop transitions $P_{v \to v}$,
$v \in \mathcal{V}$, and the monomials $\prod_{a \in \mathcal{A}} P_a^{N_a(\omega)} = P^{\omega}$ for each closed and loop-free $\omega$.

In the case of a reversible chain, these monomials are invariant under
the reversion.

Proposition 7.1 (Kolmogorov). Let the Markov chain $(X_n)_{n \ge 0}$ have
its transitions supported by the connected graph $\mathcal{G}$. The MC is reversible if,
and only if, $P^{\omega} = P^{r\omega}$ for all closed trajectories $\omega$.

This suggests the following definition. 

Definition 7.1. The Kolmogorov ideal or K-ideal of the graph $\mathcal{G}$ is the
ideal generated by the binomials $P^{\omega} - P^{r\omega}$, where $\omega$ is a closed trajectory.
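
The Kolmogorov condition can be tested numerically by comparing the monomial of each cycle with the monomial of its reversal. The sketch below is a minimal illustration under our own assumptions: the graph is complete, only simple cycles of length at least three are checked, and the helper names and numerical values are ours.

```python
from itertools import permutations
from math import isclose, prod, sqrt

def cycle_monomial(P, cycle):
    """Product of the transitions along the closed trajectory v0 -> ... -> vk -> v0."""
    closed = list(cycle) + [cycle[0]]
    return prod(P[v][w] for v, w in zip(closed, closed[1:]))

def satisfies_kolmogorov(P, tol=1e-9):
    """Check that every simple cycle and its reversal have the same monomial."""
    m = len(P)
    for k in range(3, m + 1):
        for cycle in permutations(range(m), k):
            if cycle[0] != min(cycle):
                continue  # each cycle is visited once, up to rotation
            forward = cycle_monomial(P, cycle)
            backward = cycle_monomial(P, cycle[::-1])
            if not isclose(forward, backward, rel_tol=tol, abs_tol=tol):
                return False
    return True

def reversible_example(kappa, s):
    """P[v][w] = s * sqrt(kappa[w] / kappa[v]) off the diagonal, remainder on the loop."""
    m = len(kappa)
    P = [[s * sqrt(kappa[w] / kappa[v]) if v != w else 0.0 for w in range(m)]
         for v in range(m)]
    for v in range(m):
        P[v][v] = 1.0 - sum(P[v])
    return P

# Satisfies detailed balance with weights kappa by construction.
P_rev = reversible_example([1.0, 2.0, 3.0, 4.0], 0.1)
# Illustrative non-reversible transition matrix.
P_non = [[0.1, 0.6, 0.3],
         [0.2, 0.1, 0.7],
         [0.5, 0.3, 0.2]]

print(satisfies_kolmogorov(P_rev), satisfies_kolmogorov(P_non))
```

The first matrix satisfies the binomial equations $P^{\omega} - P^{r\omega} = 0$ on every cycle that is checked; the second violates them on the triangle $0 \to 1 \to 2 \to 0$.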

The generation of the ideal of closed trajectories by the cycles of Prop.
6.3, together with the syzygy characterization of a Gröbner basis of an ideal,
leads to the following proposition.

Proposition 7.2.

(1) The K-ideal is generated by the set of binomials $P^{\omega} - P^{r\omega}$, where $\omega$ is a
cycle.

(2) The binomials $P^{\omega} - P^{r\omega}$, where $\omega$ is any cycle, form a reduced universal
Gröbner basis of the K-ideal.

In the previous sections we have constructed a process leading from a 
monomial parameterization of a statistical model to a binomial basis of its 
toric ideal. Here the process is reversed, in that we are given a binomial 
ideal and would like to show it is, in fact, a toric ideal. This program will 
eventually produce a parameterization of reversible Markov transitions in 
monomial form. We observe that the equations of the K-ideal are satisfied
not by probabilities, but by transition probabilities. We had a similar issue
when describing the difference between TMCs and MCs.

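The proof that the K-ideal is a toric ideal is based on standard constructions
of graph theory that we are going to review now. Let $\mathcal{C}$ be the set of cycles
of $\mathcal{A}$. For each cycle $\omega \in \mathcal{C}$ we define the cycle vector of $\omega$ to be
$z(\omega) = (z_a(\omega) : a \in \mathcal{A})$, where

$$z_a(\omega) = \begin{cases} +1 & \text{if $a$ is an arc of $\omega$,} \\ -1 & \text{if $r(a)$ is an arc of $\omega$,} \\ 0 & \text{otherwise.} \end{cases}$$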



The cycle space is the vector space generated in $\mathbb{R}^{\mathcal{A}}$ by the cycle vectors.

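For each proper subset $B$ of the set of vertices, $\emptyset \ne B \subsetneq \mathcal{V}$, say $B \in \mathcal{S}$,
we define the cocycle vector of $B$ to be $u(B) = (u_a(B) : a \in \mathcal{A})$, with

$$u_a(B) = \begin{cases} +1 & \text{if $a$ exits from $B$,} \\ -1 & \text{if $a$ enters into $B$,} \\ 0 & \text{otherwise.} \end{cases}$$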

The cocycle vectors generate the cocycle space. The cycle space and the
cocycle space orthogonally split $\mathbb{R}^{\mathcal{A}}$. We denote by $Z(\mathcal{A})$ the set of
integer-valued vectors of the cycle space. We will see below how these
integer vectors are related to transition counts.

The model matrix of the toric model we are going to produce for the
K-ideal is the matrix with row indices in $\mathcal{E} \cup \mathcal{S}$ and column indices in
$\mathcal{A}$. The element in position $(e, a)$ of the $(\mathcal{E} \times \mathcal{A})$-block is one if, and only
if, the arc $a$ belongs to the edge $e$. The element in position $(B, a)$ of the
$(\mathcal{S} \times \mathcal{A})$-block is $u_a(B)$. We call this model matrix the cocycle matrix. It
follows that the cycle space is the kernel of the cocycle matrix and $Z(\mathcal{A})$
is its lattice.
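
As an illustration, the cocycle matrix of the triangle can be built explicitly and applied to a cycle vector. The sketch below is ours: the function names, the ordering of arcs and rows, and the choice of the triangle are assumptions made for the example.

```python
from itertools import combinations

def cocycle_matrix(vertices, edges):
    """Return (rows, arcs) for the cocycle matrix of an undirected graph.

    Columns are the arcs (both directions of every edge). The first block has
    one row per edge, the second one row per nonempty proper subset B of vertices.
    """
    arcs = [(v, w) for v, w in edges] + [(w, v) for v, w in edges]
    # Edge block: 1 if the arc is one of the two directions of the edge.
    edge_rows = [[1 if {a, b} == {v, w} else 0 for a, b in arcs] for v, w in edges]
    subsets = [set(c) for k in range(1, len(vertices))
               for c in combinations(vertices, k)]
    # Cocycle block: +1 if the arc exits from B, -1 if it enters into B.
    cocycle_rows = [[(a in B and b not in B) - (b in B and a not in B) for a, b in arcs]
                    for B in subsets]
    return edge_rows + cocycle_rows, arcs

def cycle_vector(cycle, arcs):
    """+1 on the arcs of the cycle, -1 on their reversals, 0 elsewhere."""
    closed = list(cycle) + [cycle[0]]
    on_cycle = set(zip(closed, closed[1:]))
    return [(a in on_cycle) - ((a[1], a[0]) in on_cycle) for a in arcs]

vertices = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
matrix, arcs = cocycle_matrix(vertices, edges)
z = cycle_vector((0, 1, 2), arcs)
# Image of the cycle vector under the cocycle matrix, one entry per row.
products = [sum(r * c for r, c in zip(row, z)) for row in matrix]
print(len(matrix), len(arcs), z, products)
```

In this example the matrix has 3 edge rows and 6 cocycle rows over 6 arcs, and the cycle vector of $0 \to 1 \to 2 \to 0$ is annihilated by every row.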

The following definition provides a generalization of the Hilbert basis
we already used in our discussion of the border of an A-model. It is needed
below to provide a more precise version of the cycle decomposition of a
closed trajectory.

Definition 7.2. 

(1) Given two integer vectors $z_1, z_2 \in \mathbb{Z}^{\mathcal{A}}$, we say $z_1$ is conformal to $z_2$,
$z_1 \sqsubseteq z_2$, if the component-wise product is nonnegative and $|z_1| \le |z_2|$
component-wise, i.e. $z_{1,a} z_{2,a} \ge 0$ and $|z_{1,a}| \le |z_{2,a}|$ for all $a \in \mathcal{A}$.

(2) A Graver basis of $Z(\mathcal{A})$ is the set of the minimal nonzero elements with
respect to the conformity partial order $\sqsubseteq$.
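
The conformity relation is easy to state in code. The sketch below is a minimal illustration with names of our choosing; it extracts the minimal nonzero elements of a finite list of vectors, which is only a finite stand-in for the minimal elements of the whole lattice.

```python
def is_conformal(z1, z2):
    """True if z1 is conformal to z2: same signs and |z1| <= |z2| component-wise."""
    return all(a * b >= 0 and abs(a) <= abs(b) for a, b in zip(z1, z2))

def minimal_elements(vectors):
    """Nonzero vectors of the list with no other nonzero vector conformal to them."""
    nonzero = [z for z in vectors if any(z)]
    return [z for z in nonzero
            if not any(y != z and is_conformal(y, z) for y in nonzero)]

candidates = [[1, 0, -1], [2, 0, -2], [0, 1, 0], [0, 0, 0]]
minimal = minimal_elements(candidates)
print(is_conformal([1, 0, -1], [2, 0, -2]), is_conformal([1, 0, 1], [2, 0, -2]), minimal)
```

Here $(1, 0, -1)$ is conformal to $(2, 0, -2)$, so the latter is not minimal, while $(1, 0, 1)$ is not conformal to it because the signs of the last components disagree.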

Proposition 7.3. 

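(1) For each integer cycle vector $z \in Z(\mathcal{A})$ there exist
cycles $\omega_1, \dots, \omega_n \in \mathcal{C}$ and positive integers $\alpha(\omega_1), \dots, \alpha(\omega_n)$, such that
$z^+ \ge z^+(\omega_i)$, $z^- \ge z^-(\omega_i)$, $i = 1, \dots, n$, and $z = \sum_{i=1}^{n} \alpha(\omega_i) z(\omega_i)$.

(2) The set $\{z(\omega) : \omega \in \mathcal{C}\}$ is a Graver basis of $Z(\mathcal{A})$.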

From the previous proposition follows the result on the K-ideal. 

Proposition 7.4. The K-ideal is the toric ideal of the cocycle matrix. 

In fact, the binomials $P^{\omega} - P^{r\omega}$, $\omega \in \mathcal{C}$, form a Graver basis of the K-ideal.
The cocycle matrix has negative entries and it could easily be modified to a
matrix with nonnegative entries as we have done in the discussion of the
Ising model.

The previous algebraic statement is rephrased in statistical terms as 
follows. 

(1) The strictly positive reversible transition probabilities on $(\mathcal{V}, \mathcal{A})$ are
given by

$$P_{v \to w} = s(v,w) \prod_{B} t_B^{u_{v \to w}(B)} = s(v,w) \prod_{B:\, v \in B,\, w \notin B} t_B \prod_{B:\, w \in B,\, v \notin B} t_B^{-1} ,$$

where $s(v,w) = s(w,v) > 0$, $t_B > 0$.

(2) The first set of parameters, $s(v,w)$, is a function of the edge $vw \in \mathcal{E}$.

(3) The second set of parameters, $t_B$, $B \in \mathcal{S}$, represents the deviation from
symmetry. It is not identifiable because the full set of cocycle vectors
$u(B)$, $B \in \mathcal{S}$, is not linearly independent.

(4) The parametrization can be used to derive an explicit form of the
invariant probability.

The following proposition is a summary of all results.

Proposition 7.5. Consider the strictly non-zero points on the K-variety.

(1) The symmetric parameters $s(e)$, $e \in \mathcal{E}$, are uniquely determined. The
parameters $t_B$, $B \in \mathcal{S}$, are confounded by the $(\mathcal{S} \times \mathcal{A})$-block of the cocycle
matrix.

(2) An identifiable parametrization is obtained by taking a subset of
parameters corresponding to linearly independent rows, denoted by $t_B$,
$B \in \mathcal{T}$, $\mathcal{T} \subset \mathcal{S}$:

$$P_{v \to w} = s(v,w) \prod_{B \in \mathcal{T}:\, v \in B,\, w \notin B} t_B \prod_{B \in \mathcal{T}:\, w \in B,\, v \notin B} t_B^{-1} .$$

(3) The detailed balance equations, $\kappa(v) P_{v \to w} = \kappa(w) P_{w \to v}$, are verified if,
and only if,

$$\kappa(v) \propto \prod_{B:\, v \in B} t_B^{-2} .$$
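
The monomial parametrization can be checked numerically on the triangle. The sketch below builds the off-diagonal transitions from symmetric edge parameters $s$ and subset parameters $t_B$, using all nonempty proper subsets $B$, and tests detailed balance with the weights $\kappa(v) = \prod_{B:\, v \in B} t_B^{-2}$. The exponent $-2$ is our reading, obtained from the ratio $P_{v \to w} / P_{w \to v}$ in the parametrization; the function names and numerical values are illustrative.

```python
from itertools import combinations
from math import isclose, prod

def proper_subsets(vertices):
    """All nonempty proper subsets of the vertex set, as frozensets."""
    return [frozenset(c) for k in range(1, len(vertices))
            for c in combinations(vertices, k)]

def reversible_transitions(vertices, s, t):
    """Off-diagonal transitions P[v][w] = s(v,w) * prod_B t_B^{u_{v->w}(B)}."""
    P = {v: {} for v in vertices}
    for (v, w), s_vw in s.items():
        for a, b in ((v, w), (w, v)):
            # Subsets the arc a -> b exits from contribute t_B,
            # subsets it enters into contribute 1 / t_B.
            exits = prod(t_B for B, t_B in t.items() if a in B and b not in B)
            enters = prod(t_B for B, t_B in t.items() if b in B and a not in B)
            P[a][b] = s_vw * exits / enters
    return P

def kappa_from_t(vertices, t):
    """Unnormalized weights kappa(v) = prod over B containing v of t_B^(-2)."""
    return {v: prod(t_B ** -2 for B, t_B in t.items() if v in B) for v in vertices}

vertices = (0, 1, 2)
s = {(0, 1): 0.10, (1, 2): 0.20, (0, 2): 0.05}  # one symmetric parameter per edge
t = {B: 1.0 + 0.1 * (i + 1) for i, B in enumerate(proper_subsets(vertices))}

P = reversible_transitions(vertices, s, t)
kappa = kappa_from_t(vertices, t)
balanced = all(isclose(kappa[v] * P[v][w], kappa[w] * P[w][v], rel_tol=1e-9)
               for v in vertices for w in P[v])
off_diagonal_sums = {v: sum(P[v].values()) for v in vertices}
print(balanced, off_diagonal_sums)
```

With these values the off-diagonal row sums stay below one, so the remaining mass can be assigned to the loops $P_{v \to v}$ to obtain a transition matrix.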

It is possible to give an algebraic form of the original Kolmogorov state- 
ment on the equivalence of detailed balance with equality of transitions on 
closed trajectories in the form of a statement on elimination ideals. 

Definition 7.3. The detailed balance ideal is the ideal

$$\operatorname{Ideal}\Big( \prod_{v \in \mathcal{V}} \kappa(v) - 1, \ \kappa(v) P_{v \to w} - \kappa(w) P_{w \to v}, \ (v \to w) \in \mathcal{A} \Big)$$

in the ring $\mathbb{Q}[\kappa(v), v \in \mathcal{V}; \ P_{v \to w}, (v \to w) \in \mathcal{A}]$.

Proposition 7.6.

(1) The matrix $[P_{v \to w}]_{(v \to w) \in \mathcal{A}}$ is a point of the variety of the K-ideal if and
only if there exists $\kappa = (\kappa(v) : v \in \mathcal{V})$ such that $(\kappa, P)$ belongs to the
variety of the detailed balance ideal.

(2) The detailed balance ideal is a toric ideal.

(3) The K-ideal is the $\kappa$-elimination ideal of the detailed balance ideal.

By combining the monomial representation of the transitions and the
monomial representation of the invariant probability, we obtain a classical
parameterization of reversible transitions in the form

$$P_{v \to w} = s(v,w) \, \kappa(w)^{1/2} \kappa(v)^{-1/2} ,$$

together with the constraint $\sum_{w \ne v} P_{v \to w} \le 1$, $v \in \mathcal{V}$.

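This form is well known in the literature on the Hastings-Metropolis
simulation algorithm, where we are given an unnormalized positive probability
$\kappa$ and a transition $Q_{v \to w} > 0$ if $(v \to w) \in \mathcal{A}$. We are required to
produce a new transition $P_{v \to w} = Q_{v \to w} \, \alpha(v,w)$ such that $P$ is reversible with
invariant probability $\kappa$ and $0 < \alpha(v,w) \le 1$. We have

$$Q_{v \to w} \, \alpha(v,w) = s(v,w) \, \kappa(w)^{1/2} \kappa(v)^{-1/2}$$

and moreover we want

$$\alpha(v,w) = \frac{s(v,w) \, \kappa(w)^{1/2}}{Q_{v \to w} \, \kappa(v)^{1/2}} \le 1 .$$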

Proposition 7.7. Let $Q$ be a probability on $\mathcal{V} \times \mathcal{V}$, strictly positive on $\mathcal{E}$,
and let $\pi(x) = \sum_y Q(x,y)$. If $f : ]0,1[ \times ]0,1[ \to ]0,1[$ is a symmetric function
such that $f(u,v) \le u \wedge v$ then

$$P(x,y) = \begin{cases} f(Q(x,y), Q(y,x)), & \{x,y\} \in \mathcal{E}, \\ \pi(x) - \sum_{y:\, y \ne x} P(x,y), & x = y, \\ 0, & \text{otherwise}, \end{cases}$$

is a 2-reversible probability on $\mathcal{E}$ such that $\pi(x) = \sum_y P(x,y)$, positive if $Q$
is positive.

The proposition applies to various cases of interest, e.g. $f(u,v) = u \wedge v$,
$f(u,v) = uv / (u + v)$, $f(u,v) = uv$.
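
The construction of the proposition can be tried on a small example. The sketch below is a hedged illustration: the function name, the matrix `Q` and the convention that the edges are the pairs on which `Q` is positive in both directions are our assumptions.

```python
from math import isclose

def reversible_joint(Q, f):
    """Build P from Q: f on the off-diagonal pairs, the remaining mass on the diagonal."""
    m = len(Q)
    pi = [sum(row) for row in Q]  # first marginal of Q
    P = [[0.0] * m for _ in range(m)]
    for x in range(m):
        for y in range(m):
            # Assumption: {x, y} is an edge when Q is positive in both directions.
            if x != y and Q[x][y] > 0 and Q[y][x] > 0:
                P[x][y] = f(Q[x][y], Q[y][x])
        P[x][x] = pi[x] - sum(P[x][y] for y in range(m) if y != x)
    return P, pi

# A strictly positive joint probability on 3 x 3 points (entries sum to one).
Q = [[0.10, 0.20, 0.05],
     [0.05, 0.10, 0.15],
     [0.15, 0.05, 0.15]]

choices = {
    "min": min,
    "harmonic": lambda u, v: u * v / (u + v),
    "product": lambda u, v: u * v,
}
results = {name: reversible_joint(Q, f) for name, f in choices.items()}
for name, (P, pi) in results.items():
    print(name, P)
```

Each choice of $f$ yields a symmetric matrix with the same marginal $\pi$ as $Q$. With $f(u,v) = u \wedge v$ and $Q(x,y) = \kappa(x) Q_{x \to y}$ the resulting transition is $Q_{x \to y} \min\{1, \kappa(y) Q_{y \to x} / (\kappa(x) Q_{x \to y})\}$, the usual Metropolis-Hastings acceptance rule.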



Notes 

A recent exposition of the theory of reversible MCs appears in the lecture
notes by Aldous and Fill. We use standard results in graph theory, see
e.g. the monograph by Bollobás. Here we mainly follow our paper. The
application to simulation is discussed e.g. in the monograph by Liu.

Discussion

The basic notions in Probability and Mathematical Statistics are those of
independence and conditioning, which in turn are expressed by product of
probability and conditional probability. It is not by chance that almost all
classical statistical models have a product form and that the logarithmic
transformations are a key ingredient of computation in Statistics. Modern
Combinatorial and Computational Commutative Algebra have provided a
unifying framework for the diverse instances of such basic structures. In
particular, the notion of toric ideal captures the essentials of what we have
called Lattice Exponential Families, i.e. the discrete case that is rife in
Applied Statistics and Statistical Physics. We have presented an overview
of the algebraic theory of statistical models that fall under the scheme,
together with an algebraic discussion of their limit cases and their
differentiation. The application to Markov chains is an expansion of these ideas
to objects that do not belong to a probability simplex. This is actually a
special case of a more general interesting theory, namely Bayes networks.
Applications to Bayesian statistics, a topic we have not touched upon, are
to be expected.

We thank the organizers of the Osaka conference, especially Professor
Takayuki Hibi, for providing an ideal place to discuss these, and related,
ideas. We thank Paolo Baldi, Francesco Grande, Luigi Malagò and Fabio
Rapallo for useful discussions while this paper was in preparation.